Most existing commercial video surveillance applications only capture event frames, and the
accuracy of those captures is poor. Our review of existing systems found no research technique
that offers context-based identification of outliers in a scene. We therefore present a
framework that uses an unsupervised learning approach to identify outliers precisely in given
video frames with respect to the contextual information of the scene. The proposed system
applies a matrix decomposition method based on multivariate analysis to balance faster
response time against higher accuracy when detecting an abnormal event or object as an
outlier. Using an analytical methodology, the proposed system applies a blocking operation
followed by sparsity-based learning to perform detection. The study outcome shows that the
proposed system offers higher accuracy than the existing system, with faster response time.
1. INTRODUCTION

The adoption of video surveillance systems has been increasing at a fast pace owing to growing
security concerns [1]. At present, various video processing techniques have significantly
benefited computer vision strategies [2], [3]. Although existing systems are capable of capturing
high-definition video and transmitting high-definition video frames over wireless links, they also
suffer from some significant pitfalls [4], [5]. The most prominent issue in existing systems is the
identification of outliers, which may take different shapes and forms of an object present in the
scene with respect to its context [6]. An outlier here means the presence of objects or events that
have a very low probability of occurring with respect to the given context of the scene.
Constructing a framework to identify such events or objects is quite challenging, especially in an
unsupervised manner, and hence it has drawn the attention of the research community. The existing
approaches use spatial and temporal factors [7], [8], optical flow [9-11], background models [12],
histogram-based methods [13], [14], behavioral template matching [15], [16], etc. From a practical
implementation viewpoint, both feature extraction and framework construction are significant for
efficient identification of outliers in given video frames. Therefore, the proposed system offers a
framework that uses an unsupervised learning mechanism to develop an automatic system for detecting
abnormal events. We found that the feature-based attributes used in existing research work are not
capable enough of representing the complex contextual behaviour of video frames. Existing systems
are better at capturing gradient-related information, which is effective for resisting changes in
illumination and appearance. However, we have also seen that video frames are more likely to exhibit
the characteristics of gradients, and hence such existing systems may ignore a voluminous amount of
significant contextual information. The existing systems are also found to depend so little on data
that it is not feasible to use task-specific information in the dataset. The proposed study is
motivated by advances in the use of features for identification problems [17], [18] and thereby
presents a framework that uses the matrix decomposition principle to optimize the learning process
for videos. We find that our proposed system is capable of capturing more relevant information of
higher complexity and can therefore harness much of the task-related information present in the
dataset. This mechanism is used for extracting features. In that context, the proposed mechanism can
offer better performance than the existing system. Another significant contribution of the proposed
system is its use of probability theory to compute the level of outliers from the pixel level using
a block-based transformation process. To avoid detecting too many local values, the proposed system
appends both temporal and spatial data, which carry more contextual information. Note that the
proposed system uses an unsupervised learning approach that is completely free from human
intervention in both feature-based learning and framework-based learning. The study outcome shows
better performance than the existing system and better computational performance during
identification. Section 1.1 discusses the existing literature, where different detection schemes
used for outlier localization in video surveillance systems are reviewed, followed by the research
problems in Section 1.2 and the proposed solution in Section 1.3. Section 2 discusses the algorithm
implementation for accomplishing the proposed research goals, followed by the result analysis in
Section 3. Finally, concluding remarks are provided in Section 4.


1.1. Background 

This section discusses the existing techniques for identifying significant events in the form of
an outlier. Dutta et al. [19] presented a framework using sparse coding for saliency detection as
well as identification of outliers. A saliency-based approach was also used by Jang and Park [20]
to identify potholes from grayscale images. Wang et al. [21] used localized histograms for
analyzing crowded scenes with a supervised learning approach. Zhou and Torre [22] adopted a spatial
as well as temporal scheme for analyzing human poses using a three-dimensional capturing model.
Fu et al. [23] presented a technique for identifying possible outliers framed from the annotation
of the video. Li and Haupt [24] investigated the problems associated with localizing outliers in
large samples of noisy data. Xue et al. [25] introduced a technique where outlier detection is
carried out by emphasizing the estimation of foreground under a sparsity constraint. Zhou et al.
[26] used a low-rank representation for identifying outliers of contiguous type. Gopalan et al.
[27] used a learning-based methodology, followed by feature extraction from a pixel hierarchy and a
particle filter, to identify outliers from traffic data considering lane markings. Abnormal
behavior detection was also investigated over facial data by Yang and Bhanu [28]. Ni et al. [29]
used principal component analysis along with a mining-based approach in order to identify a
peculiar pattern of age from social videos. Ammar and Lashkar [30] presented a technique that
diagnoses the typical pattern of a sleep disease directly from optical video flow. Identification
of outliers was also carried out by Choi and Choi [31] for assisting in a fire-resistive
application. Feris et al. [32] presented a correction technique for lighting conditions that
significantly assists in show-based outlier detection. Jayasuganthi et al. [33] modeled a uniform
background using a Gaussian algorithm followed by segmentation, and used the k-means algorithm for
video surveillance. Liu et al. [34] used a sparse collaborative model for detecting outliers in a
given video. Maurya and Toshniwal [35] used a supervised learning algorithm to train on data
gathered from a nuclear power plant to identify a set of outliers. Pang et al. [36] presented a
study where feature extraction as well as clustering of data is adopted to detect outliers in
public scene images. A similar clustering methodology was adopted by Pritch et al. [37] on video
data to identify abnormal events. The effect of time-series analysis on outlier detection was
studied by Teng et al. [38]. Bayat et al. [39] discussed the detection of goals in soccer using an
event detection mechanism and achieved better accuracy with fewer detection failures. A forgery
detection model for mobile-recorded and surveillance videos was presented by Staffy et al. [40];
this model was found able to identify tampering irrespective of video format. Teddy et al. [41]
analyzed the performance of automatic number plate identification on an Android smartphone and
found effective recognition at 0.98 s processing time.

Therefore, it can be seen that various techniques have evolved in recent times for solving the
problems associated with outlier detection. All the existing studies have significant advantages
as well as contributions. However, the existing studies are also associated with significant
loopholes that need to be addressed. The next section briefly describes the problems identified
from the existing literature.


1.2. Research Problem 
The significant research problems are as follows:
a. The existing techniques of outlier detection have been constructed around a particular
pattern of an object without considering the actual context of the scene.
b. Use of a supervised learning approach increases the accuracy of identifying an
abnormal event, but at the cost of computational complexity.
c. Use of prior information about the object and its types narrows the existing systems to a
specific research environment, and they become incompatible when that environment changes.
d. The extent of false positives is high in the conventional techniques even where sparse
coding has been carried out in order to perform outlier detection.
Therefore, the problem statement of the proposed study can be stated as "To design a framework that
offers more contextual scene analysis to enhance the precision level of identification of outliers."
The next section discusses the proposed solution.


1.3. Proposed Solution 

The proposed study is a continuation of our prior work [42], [43]. The present solution aims to
develop a framework for video surveillance systems that is capable of identifying outliers in a
given set of captured video frames. The primary consideration of this paper is that an outlier is
never instantaneous; it normally exists in the given scene with respect to both a time factor and a
spatial factor, and both of these factors can also be stated as contextual factors. This can be
empirically represented by Equation (1),


$A = A_{spat} \times A_{time}$ (1)


In the above expression, the variable $A$ represents an aggregation of local attributes, where
$A_{spat}$ and $A_{time}$ represent the spatial and temporal attributes associated with the local
feature. The idea of the proposed system is to compute the local attribute $a$ ($a \in A$) for all
the pixels to develop a local attribute $Z_i$ that is localized at the centroidal position for a
given pixel. This process results in the generation of a histogram $\alpha(Z_i)$, where $\alpha$
represents the negative coefficient. Therefore, applying probability, the identification of the
outliers $prob(A)$ can be empirically expressed as,


k=l 
prob(Z, =$,)=7, 1/7; (3) 


In the above Equation (2), $Z_k = (x_k - x_0, y_k - y_0, t_k - t_0)$ can be considered to represent
the position of $Z_k$ relative to the reference position $(x_0, y_0, t_0)$. The above expression can
further be split in the form of $prob(z_k \mid \phi_j, j)$ to represent probabilistic selection for
a position with time and spatial-based attributes, where the variable $\phi$ represents the
dictionary. Also, in the above empirical expression, $prob(Z_k = \phi_j)$ is equivalent to the
correlation factor between $X_k$ and dictionary $j$, as shown in Equation (3). Although the
computation of $prob(A)$ is a slightly complex process owing to its dependency on the histogram
factors $\alpha$ for all the given values of $A$, it also makes it possible to compute such
attributes once and reuse the histogram attribute $\alpha$ for all the local attributes of
different pixel aggregates $A$. The histogram factor of negative coefficient $\alpha$ is
empirically computed as shown in Equation (4):


$\alpha(Z) = \sum_{(spat,\,time) \in Z} \alpha_{(spat,\,time)}$ (4)


One interesting observation is that the above histogram attribute is approximately equivalent to
the local attributes, which offer a significant amount of granularity by harnessing the minute
information in the given multimedia file (i.e., video) to improve accuracy when performing outlier
identification.
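
The relative positions and the histogram described above can be illustrated with a minimal sketch. It assumes that each local attribute is available as a discrete label and that positions are (x, y, t) tuples; the function names `relative_positions` and `attribute_histogram` are hypothetical and are not taken from the original implementation.

```python
from collections import Counter


def relative_positions(centre, neighbours):
    """Return (x_k - x_0, y_k - y_0, t_k - t_0) for every neighbour position.

    `centre` is the reference position (x_0, y_0, t_0); `neighbours` is a
    list of (x_k, y_k, t_k) tuples. Both layouts are assumptions made here.
    """
    x0, y0, t0 = centre
    return [(x - x0, y - y0, t - t0) for x, y, t in neighbours]


def attribute_histogram(attributes):
    """Count how often each (hypothetical) discrete local attribute occurs."""
    return Counter(attributes)


offsets = relative_positions((5, 5, 2), [(6, 5, 2), (5, 4, 1)])
histogram = attribute_histogram(["edge", "flat", "edge"])
```

Counting the attributes once per neighbourhood is what allows the histogram to be reused across different pixel aggregates, which is the reuse property mentioned in the text.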

The next step of the study is to perform the learning operation, where the initial learning
process is applied to the selection probabilities $prob(z_k \mid \phi_j, j)$. Applying the learning
operation is not challenging here, as it can be executed directly over the training data. The second
learning process concerns identification of the cut-off $u$. The main idea implemented in this part
of the study is that if all the aggregated points are observed as chronological, time-based data,
then outliers occur with a significantly lower frequency than other data points. Therefore, while
monitoring, an aggregate is tested against the following decision condition shown in Equation (5):


$prob(A) < u$ (5)


If the above condition is found to be valid, then the system decides that the monitored object is
an outlier. To learn the cut-off $u$ from the training data, a hard-coded probability of outlier
could be selected according to the requirement of the user. We consider that a small value of this
probability will not lead to efficient detection, as it has a higher chance of generating false
positives because only a small number of outliers will be ignored. At the same time, a high value
of this probability should also be rejected, as the outcome may not be practical. This problem is
avoided by a sparsity-based learning approach that computes $prob(A)$ for all the aggregates in the
training dataset and sets the cut-off $u$ to a value such that the proportion of aggregates
satisfying the condition $prob(A) < u$ equals a specific probability, i.e., $prob(a)$, where
$a \in A$. An analytical research methodology is applied to carry out the implementation of the
proposed study. Figure 1 highlights the schematic diagram of the proposed system to identify any
form of abnormal events as outliers from the video frames.

Figure 1. Schematic Architecture of Proposed System
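
The cut-off learning described above can be read as an empirical quantile over the training probabilities. The following is a minimal sketch under that assumption; the function names and the parameter `target_fraction` (the desired proportion of training aggregates with prob(A) < u) are illustrative and are not part of the original implementation.

```python
def learn_cutoff(train_probs, target_fraction):
    """Choose u so that roughly `target_fraction` of the training
    aggregates satisfy prob(A) < u (an empirical quantile)."""
    if not train_probs:
        raise ValueError("train_probs must not be empty")
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must lie in [0, 1]")
    ordered = sorted(train_probs)
    # Index of the first value that should NOT fall below the cut-off.
    index = min(int(target_fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]


def is_outlier(prob_a, cutoff):
    """Decision rule prob(A) < u."""
    return prob_a < cutoff


training = [0.90, 0.80, 0.05, 0.70, 0.60, 0.95, 0.85, 0.75, 0.65, 0.55]
u = learn_cutoff(training, target_fraction=0.1)
flags = [is_outlier(p, u) for p in training]
```

With these hypothetical values, one of the ten training aggregates falls below the learned cut-off, which matches the requested fraction of 0.1.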


The proposed system is designed in two steps: (a) unsupervised training and (b) validation. At the
preliminary level, the system uses local-level features that are characterized by more
sophisticated patterns. With the aid of probability theory, the proposed system assesses the level
of outliers in the given data using a blocking operation. An algorithm for extracting blocks is
designed that takes a video frame as input and yields the extracted blocks as output. These blocks
assist in further extraction of local-level information, thereby constructing a good number of
features with the aid of the next algorithm for sparsity-based learning. Both the spatial and the
time-related information captured from the trained blocks are used to detect outliers.
Finally, an algorithm for outlier detection is formulated. The significant contributions of the
proposed system are as follows:
a. Construction of an analytical framework for feature extraction from a given video set.
b. Incorporation of a blocking operation for further granularity in the feature extraction process.
c. Use of a dictionary to assist in better identification.
The significance of the study contribution is that the proposed system is capable of solving the
multivariate problems that arise in video surveillance systems. Therefore, the context of the scene
is understood well, and the detailed information is captured by the proposed framework. The
proposed framework is mainly intended for abnormal object behavior in any environment.
The next section discusses the algorithm implementation, followed by a discussion of the outcome
obtained from the study.


2. ALGORITHM IMPLEMENTATION 

The prime purpose of the proposed algorithm is to identify outliers precisely in the video frames.
This research aim is carried out through three different algorithms, which are responsible for
extracting blocks, applying sparsity to implement the learning strategy, and performing outlier
detection. The algorithms are constructed in sequential form and are therefore illustrated in
sequence. The steps involved in algorithm-1 are as follows:


Algorithm-1 for Extraction of Block
Input: f (number of frames)
Output: B_dict (Extracted block)
Start
1. init f
2. [nr, nc, k] ← size(f)
3. B ← blocks_5x5(f)
4. n_rd ← prod(BS) & n_cd ← size(B), obs ← BS_1*BS_2
5. B_dict ← mean[(nn-1)*obs+1 : nn*obs]
6. For j = 1:size(B_dict)
7.    v = v - m(j)
8.    a = v/norm(v)
9.    B_dict
10. End
End


Algorithm-1 initially takes the video frames f as input (Line-1), followed by a number of
operations in consecutive steps for block computation. The size of the frame f is mapped into three
variables: the number of rows nr, the number of columns nc, and the index k (Line-2). The next step
converts the pixel elements into columnar form to divide the frame into distinct 5x5 blocks B
(Line-3). Two variables n_rd and n_cd hold the number of rows and columns of the dictionary,
respectively, along with the size of one block obs (Line-4). The dictionary is then created
considering n_rd, n_cd, and the number of training frames divided by the block size. For all sizes
of the dictionary-based blocks (Line-6), all the frames are considered and further divided into
distinct 5x5 blocks. The dictionary-based blocks B_dict are computed as shown in Line-5. A loop over
all sizes of B_dict (Line-6) computes the vector v = B_dict(j), where m_j represents the mean value
of B_dict (Line-7), which finally leads to the generation of the extracted blocks B_dict as the
outcome.
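
The block extraction and normalization steps can be sketched as follows. This is a minimal illustration, assuming a frame is a list of rows of grayscale values, that blocks do not overlap, and that the mean in Line-7 is taken per block; the function names are hypothetical and the original Matlab implementation may differ.

```python
import math


def extract_blocks(frame, block_size=5):
    """Split a 2-D frame into distinct block_size x block_size blocks and
    return each block flattened to a column-style vector."""
    rows, cols = len(frame), len(frame[0])
    blocks = []
    for r in range(0, rows - block_size + 1, block_size):
        for c in range(0, cols - block_size + 1, block_size):
            blocks.append([frame[r + i][c + j]
                           for i in range(block_size)
                           for j in range(block_size)])
    return blocks


def normalize_block(v):
    """Subtract the mean (v = v - m) and scale to unit norm (a = v / norm(v))."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0.0:
        # A perfectly flat block carries no gradient information.
        return centered
    return [x / norm for x in centered]


frame = [[r * 10 + c for c in range(10)] for r in range(10)]
blocks = extract_blocks(frame)            # four 5x5 blocks of 25 values each
normalized = [normalize_block(b) for b in blocks]
```

Removing the mean and scaling to unit norm makes the block vectors comparable across frames with different brightness, which is one plausible reason for Lines 7 and 8.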

After the blocks are extracted from the given frames, the proposed system implements a novel form
of matrix decomposition to perform multivariate analysis using a sparsity-based learning process.
The steps of algorithm-2 for sparsity-based learning and of algorithm-3 for outlier detection are
given below.

Algorithm-2 first initializes the matrix index k, which sets the dimension of the dictionary used
in the multivariate analysis (Line-1). A structure is maintained for the dictionary, followed by
the formation of a loop as shown in Line-3. A matrix Dt is created to store all the
dictionary-related values (Line-4), followed by the creation of a super-index X (Line-6) that
maintains all the multivariate matrices of Dt. The next step applies a function φ that performs
sparse matrix decomposition using linear algebra (Line-8) over the matrix index k and the
super-index X. As the outcome of this decomposition is always positive, the resulting matrix is
easier to compute. The algorithm generates multiple features, e.g., X (matrix with mixed signs),
A (basis matrix), and Y (coefficient matrix). The dictionary produced in Line-8 is reused for
identifying outliers in the test frames. After the features have been extracted by the multivariate
analysis used in the learning phase, the next part of the algorithm implementation is the
identification of the outliers.


Algorithm-2 for Sparsity-based Learning
Input: k (matrix index), B_dict (Extracted block), D (Dictionary)
Output: extracted features (X, A, Y)
Start
1. init k
2. get B_dict
3. For i = 1:size(Dict)
4.    Dt ← Dict(i)
5.    For j = 1:size(Dt)
6.       X(j) ← Dt(j)
7.    End
8.    Apply φ(X, k) → Dictionary(X, A, Y)
9. End
End
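
The decomposition step in Line-8 can be illustrated with a small sketch. It assumes plain non-negative matrix factorization with multiplicative updates as a stand-in for the function φ; the sparse regularization and the handling of mixed-sign data described in the paper are deliberately omitted, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of X - W H."""
    approx = matmul(w, h)
    return sum((xi - ai) ** 2
               for xr, ar in zip(x, approx)
               for xi, ai in zip(xr, ar)) ** 0.5


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (m x n) into W (m x rank) and
    H (rank x n) using multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
basis, coefficients = nmf(x, rank=1, iterations=50)
```

In this sketch `basis` plays the role of the basis matrix A and `coefficients` the role of the coefficient matrix Y; both stay non-negative because the updates only multiply non-negative quantities.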




The input to algorithm-3 for outlier detection is the dictionary formed by the prior algorithm
(Line-1). All the images, as well as the ground truth images, are considered in this phase. It is
followed by the first algorithm for block extraction (Line-2), which repeats similar steps, e.g.,
creation of a dictionary of 5x5 block size, reading the test frame, dividing the frame into 5x5
blocks, computation of the dictionary coefficients (n_rd, n_cd, and obs), computation of the mean
of B_dict, and estimation of the dictionary-based vector v (Line-3). A new matrix M_dict is
formulated as the product of the dictionary over all sizes of B_dict and the super-index (Line-4).
The distance between M_dict and the normalized vector a is computed as prob, followed by
computation of the final probability, i.e., E_prob (Line-6). The algorithm then estimates the
minimum value of E_prob, followed by computation of the final feature Dim, which transforms the
columnar information into the form of an image (for highlighting the outliers). Finally, using the
information restored from the ground truth image, the system checks whether the final feature Dim
is more than 1 (Line-8). Ground truth images play a significant role in outlier detection: first,
the ground truth images GT are read and converted to binary images. This is followed by applying
another function Q, which performs outlier detection using the ground truth image GT, the binary
image bw for the condition Dim>1, and the final feature Dim. The function Q is designed in the
following steps: first, the region-based properties of the ground truth image GT are estimated
based on the centroid of each region, followed by concatenation with the centroid and by a
morphological dilation operation to obtain the Dim and bw values. The function Q yields the
probability map Prob_map, which is checked against its cut-off value to identify the outlier
objects (Line-11 to Line-13). This calculation is followed by estimation of accuracy performance in
later stages. A closer look at the algorithm shows that the proposed system uses an unsupervised
learning algorithm to address the uncertainty associated with detecting anomalous patterns. By
applying multivariate analysis over the matrix decomposition, the proposed algorithm ensures that
positive elements always depict non-outliers; the presence of any non-positive element is the only
possible case of an outlier. Therefore, the proposed system is easier to implement and requires
fewer iterations to reach convergence. Another significant advantage of this algorithm is its
clustering characteristic, which automatically clusters the columnar data of the vector v. In
short, the proposed system harnesses sparsity-based matrix decomposition with multivariate analysis
to obtain unique numerical features that carry enough information to assist in the identification
of outliers. The matrix holds enough information without degrading system performance. Another
distinctive property of this algorithm is that fewer variable dependencies also reduce resource
dependencies and give faster computational time. The next section discusses the results obtained.
Algorithm-3 for outlier detection
Input: B_dict (Extracted block), Dictionary
Output: Detection of outliers of an object
Start
1. Input Dictionary
2. For j = 1:size(B_dict)
3.    Apply Algorithm-1
4.    M_dict ← Dictionary(j).X
5.    prob ← (M_dict - (a, size(M_dict)))^2
6.    E_prob = √(Σ prob)
8.    bw ← Dim > 1
9.    GT = proc(GT)
10.   Prob_map ← Q(GT, bw, Dim)
11.   If Prob_map > 1.6
12.      Anomaly Detected
13.   End
End
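
Lines 4 to 6 and the subsequent minimum can be sketched as a nearest-atom distance test. This is a minimal illustration, assuming the dictionary is a list of vectors of the same length as the normalized block vector a; the function names and the `threshold` parameter are hypothetical, and the ground-truth function Q is not modelled here.

```python
import math


def min_distance_to_dictionary(a, dictionary):
    """Smallest Euclidean distance between the normalized block vector `a`
    and any dictionary entry: sqrt(sum((M_dict - a)^2)), minimized."""
    best = math.inf
    for atom in dictionary:
        squared = [(m - x) ** 2 for m, x in zip(atom, a)]
        best = min(best, math.sqrt(sum(squared)))
    return best


def flag_outlier_blocks(blocks, dictionary, threshold):
    """Mark a block as an outlier when even its closest dictionary entry
    is farther away than `threshold` (an assumed decision rule)."""
    return [min_distance_to_dictionary(b, dictionary) > threshold for b in blocks]


dictionary = [[1.0, 0.0], [0.0, 1.0]]
test_blocks = [[1.0, 0.0], [-1.0, 0.0]]
flags = flag_outlier_blocks(test_blocks, dictionary, threshold=1.0)
```

A block that is well represented by the learned dictionary has a small minimum distance, so only blocks that no dictionary entry explains are flagged.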


3. RESULT ANALYSIS 

The algorithms for achieving the research goal of outlier detection were implemented in Matlab
using the UCSD pedestrian data. The dataset consists of approximately 6800 image sequences in which
various pedestrians walk in different directions along a given path. The anomalous object is an
individual cycling along the pedestrian path. The test images are four times the number of training
images and are accompanied by ground truth images. The visual outcomes of the unsupervised training
operation are shown in Table 1.


Table 1. Sample Visual outcomes of Training
(Columns: Set, Frame-1, Frame-50, Frame-100; rows: sets 1, 10, 20, and 30; frame images omitted)


Table 1 shows the sample visual outcomes of the 1st, 50th, and 100th frames for the 1st, 10th,
20th, and 30th training datasets. The complete training on 6800 images took approximately 0.7621
seconds on a Core i5 processor, including the extraction of the features. During training, the
features of all moving objects are extracted and passed to the next phase of the algorithm
implementation, i.e., testing. The complete training is carried out with a sparse regularization
parameter of 10 and a block size of 5x5. The challenging situation is to ensure the detection of
the anomalous object, i.e., a cyclist or a person moving with a cart, etc. This is challenging
because the dataset has different numbers of objects (persons) moving at different speeds, in
different directions, and with different mobility patterns, from which no specific pattern can be
drawn. Therefore, this case study closely matches a real-time scenario. The proposed system solves
this problem through the blocking operation, which allows explicit extraction of a feature from
specific blocks of the image. This operation supports the multivariate analysis of the different
coefficients extracted from the object in such a linear pattern that the decomposed matrix, as well
as the original matrices, have only positive elements. This concept used in training has one
dominant advantage: any anomalous object identified during training is maintained in a different
matrix, which bears a separate index of negative elements. Hence, the system proposes a
supermatrix, where one matrix holds only non-anomalous information and the other holds only the
indices of the cell positions mapping to anomalous information. This concept of unsupervised
training not only trains faster but also offers enhanced accuracy with 90% memory efficiency, as
the matrix stores only the indices of anomalous objects. For better inference of the study outcome,
the proposed system is also compared with one of the most relevant studies, by Cong et al. [44], as
shown in Figure 2.

Cong et al. [44] addressed a similar problem, i.e., detection of anomalies in video for assisting
an event detection system, but used a segmentation-based approach on a similar database. We
hypothetically compare the theoretical outcomes of the existing system with the proposed system
with respect to conventional accuracy parameters, e.g., recall, precision, specificity, and
F1-Score, and find that the proposed system offers better accuracy in the identification of
outliers than the existing system.
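
For reference, the four accuracy parameters named above follow the standard confusion-matrix definitions. The sketch below assumes that detections have already been counted as true positives, false positives, true negatives, and false negatives; the counts in the usage line are made-up illustrative values, not results from this study.

```python
def accuracy_metrics(tp, fp, tn, fn):
    """Return recall, precision, specificity, and F1-Score from raw counts.
    A ratio with a zero denominator is reported as 0.0."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)          # share of true outliers that were found
    precision = ratio(tp, tp + fp)       # share of flagged items that are outliers
    specificity = ratio(tn, tn + fp)     # share of normal items left unflagged
    f1 = ratio(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision,
            "specificity": specificity, "f1": f1}


# Hypothetical counts, for illustration only.
scores = accuracy_metrics(tp=8, fp=2, tn=88, fn=2)
```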


Figure 2. Comparative Outcomes of Accuracy
(x-axis: Performance Parameters: Recall, Precision, Specificity, F1-Score; y-axis: Accomplished
score (%); series: Existing, Proposed, Improvement)


4. CONCLUSION 

This paper presents a novel framework that emphasizes the contextual information of a scene. As a
scene can have multiple heterogeneous contexts, we apply multivariate analysis to perform matrix
decomposition. The presented technique assists in identifying outliers and is also capable of
extracting the contextual features. Algorithms are designed for the blocking operation, for
unsupervised learning using a sparsity factor, and finally for identification of the objects.